Low complexity optimal soft MIMO receiver

ABSTRACT

A low-complexity optimal soft MIMO detector is provided for general spatial multiplexing (SM) systems with two transmit and N_(R) receive antennas. The computational complexity of the proposed scheme is independent of the operating signal-to-noise ratio (SNR) and grows linearly with the constellation order. It provides the optimal maximum likelihood (ML) solution through an efficient log-likelihood ratio (LLR) calculation method that avoids an exhaustive search over all possible nodes. Its intrinsic parallelism makes it an appropriate option for implementation on DSPs, FPGAs, or ASICs. Specifically, this MIMO detection architecture is well suited to WiMax receivers based on IEEE 802.16e/m, in both the downlink (subscriber station) and the uplink (base station).

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of prior U.S. patent application Ser. No. 12/046,747, filed Mar. 12, 2008, now allowed, which is hereby incorporated by reference herein in its entirety.

TECHNICAL FIELD

The present disclosure pertains generally to multiple-input multiple-output (MIMO) communication systems, and more particularly to techniques for optimal soft-detection MIMO receivers.

BACKGROUND

Multiple-input multiple-output (MIMO) systems have received significant attention as a promising method for achieving large spectral efficiency, which has made MIMO the technology of choice in many standards such as IEEE 802.11n, IEEE 802.16e/m, and IEEE 802.20. One of the main challenges in exploiting the potential of MIMO systems is to design low-complexity, high-throughput detection schemes that are suitable for efficient VLSI realization, in order to implement low-power MIMO receivers with near-maximum-likelihood (ML) performance.

In spatial multiplexing (SM) multiple-input multiple-output (MIMO) schemes with N_(T) transmit and N_(R) receive antennas (an N_(T)×N_(R) system), where N_(R)≥N_(T), N_(T) data streams are transmitted simultaneously from the N_(T) antennas. This increases the system's spectral efficiency by a factor of N_(T), provided that the data streams can be successfully decoded. The mathematical model for an SM MIMO system is:

$y = Hs + n \qquad (1)$

where y is the N_(R)×1 received vector, H is the N_(R)×N_(T) channel matrix, s is the N_(T)×1 transmit vector, and n is the N_(R)×1 received noise vector. The average signal-to-noise ratio (SNR) of all N_(T) streams has to be maintained without increasing the total transmit power compared to single-antenna systems. To achieve the maximum spectral efficiency, the interference resulting from the simultaneous transmission of N_(T) data streams has to be suppressed at the receiver using a MIMO detection scheme. The optimum detector, achieving the full diversity order of N_(R), is the maximum-likelihood (ML) detector, which finds the transmitted symbol vector by solving the following optimization problem:

$\hat{s} = \arg\min_{s} \|y - Hs\|^2 \qquad (2)$

where ŝ represents the optimal detected symbol vector at the receiver.

This optimization problem is computationally expensive to implement, especially for high-order constellation schemes and/or MIMO systems with a large number of transmit antennas, because it requires an exhaustive search over all $Q^{N_T}$ possible input vectors, where Q is the modulation level. For instance, in a MIMO system with only two transmit antennas using 64-QAM, there are 64²=4096 symbol vectors to search through. The main downside of the ML detector is that its complexity grows as $Q^{N_T}$, i.e., exponentially with the number of bits per symbol. Thus, the goal is to design an optimal detector with the exact ML performance, whose complexity is linear in the modulation level and independent of the SNR and the channel status.
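To make the size of the exhaustive search concrete, the following Python sketch (standard library only) implements the brute-force search of (2) and counts the visited vectors. It assumes an unnormalized square-QAM alphabet on the odd-integer grid and stores matrices as lists of rows; the function names are hypothetical and are not part of the disclosure.

```python
import itertools


def qam_points(order):
    """Square QAM alphabet on the odd-integer grid, e.g. order=4 -> {+/-1 +/-1j}."""
    m = int(round(order ** 0.5))
    if m * m != order:
        raise ValueError("only square QAM orders are supported")
    levels = [2 * k - (m - 1) for k in range(m)]
    return [complex(re, im) for re in levels for im in levels]


def ml_hard_detect(y, H, alphabet):
    """Exhaustive search of eq. (2) over all len(alphabet) ** N_T candidate vectors."""
    n_t = len(H[0])
    best, best_metric, visited = None, float("inf"), 0
    for s in itertools.product(alphabet, repeat=n_t):
        visited += 1
        metric = sum(abs(y[r] - sum(H[r][c] * s[c] for c in range(n_t))) ** 2
                     for r in range(len(y)))
        if metric < best_metric:
            best, best_metric = s, metric
    return best, visited


# Usage: a noiseless 2x2 example with 64-QAM on both transmit antennas.
H = [[1 + 0.5j, 0.3], [0.2j, 0.9]]
s_true = (3 - 1j, -5 + 7j)
y = [sum(H[r][c] * s_true[c] for c in range(2)) for r in range(2)]
s_hat, visited = ml_hard_detect(y, H, qam_points(64))
```

Every one of the 64² candidates is scored even though only one of them is kept, which is the cost the rest of this disclosure sets out to remove.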

Moreover, the complexity of the exhaustive-search optimal ML detection scheme grows exponentially with the number of transmit antennas. Therefore, lower-complexity suboptimal receivers must be developed for practical applications. The existing approaches used to alleviate the high computational complexity of the ML detector fall into the following two main categories:

Linear Receivers:

Zero-forcing (ZF) and minimum mean square error (MMSE) receivers are the most common low-complexity candidates; they remove the spatial interference between the transmitted data streams by linear filtering at low complexity. However, the diversity order achieved with a linear receiver is N_(R)−N_(T)+1. This means that in a 2×2 MIMO system there is no diversity gain, which results in a significant performance loss compared to the ML receiver.
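A minimal sketch of the zero-forcing receiver mentioned above, for a 2×2 channel, is given below. It rests on assumptions (closed-form 2×2 inverse, unnormalized odd-integer QAM grid, hypothetical function names) and is not a reference implementation; it inverts H and then slices each stream on its own, without the joint search that the ML detector performs.

```python
def slice_axis(v, m):
    """Nearest odd-integer level in {-(m-1), ..., m-1} to the real value v."""
    k = min(max(round((v + m - 1) / 2), 0), m - 1)
    return 2 * k - (m - 1)


def zf_detect_2x2(y, H, order):
    """ZF: invert the 2x2 channel, then slice each stream independently."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("H is singular; ZF is undefined")
    s_zf = [(d * y[0] - b * y[1]) / det, (-c * y[0] + a * y[1]) / det]  # H^{-1} y
    m = int(round(order ** 0.5))
    return [complex(slice_axis(v.real, m), slice_axis(v.imag, m)) for v in s_zf]


# Usage: noiseless 2x2 example with two 16-QAM symbols.
H = [[2 + 1j, 0.5], [0.3j, 1 - 1j]]
s_true = [1 + 3j, -3 - 1j]
y = [H[r][0] * s_true[0] + H[r][1] * s_true[1] for r in range(2)]
s_hat = zf_detect_2x2(y, H, 16)
```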

Suboptimal ML Receivers:

These are lower-complexity approximations of the ML detector with close-to-ML performance. The lower complexity results from searching a limited space instead of performing the exhaustive ML search. As a consequence, the optimal ML solution may not be included in the search space, which causes a performance loss. In general, however, these methods outperform the linear receivers. Depending on the non-exhaustive search methodology, the suboptimal algorithms fall into two main categories, namely depth-first methods and breadth-first methods.

Sphere decoding (SD) is the most attractive depth-first approach; its performance is the same as that of ML under the assumption of unlimited execution time. However, the actual runtime of the algorithm depends not only on the channel realization/status but also on the operating SNR. This leads to a variable throughput rate, which results in extra overhead in a VLSI implementation due to the additional I/O buffers required and lower hardware utilization.

Among the breadth-first search methods, the best-known approach is the K-Best algorithm, which guarantees an SNR-independent, fixed-throughput detector with close-to-ML performance. Their fixed throughput, together with the fact that breadth-first approaches are feed-forward detection schemes without feedback, makes them especially attractive for hardware implementation. There have been several efforts to implement the K-Best algorithm; however, it consists of node-expansion and sorting cores, which are both time-consuming and form the bottleneck in hardware, resulting in low-throughput architectures. Moreover, their performance also deteriorates in high-SNR regimes.

Therefore, there is a crucial need for a detector that has the optimal performance of the ML detector, the high-speed feature of the depth-first approaches, and the SNR-independent fixed-throughput architecture of the breadth-first schemes.

SUMMARY

The disclosure provides a low-complexity optimal soft MIMO detector for general spatial multiplexing (SM) systems with two transmit and N_(R) receive antennas. The computational complexity of the proposed scheme is independent of the operating signal-to-noise ratio (SNR) and grows linearly with the constellation order. It provides the optimal maximum likelihood (ML) solution through an efficient log-likelihood ratio (LLR) calculation method that avoids an exhaustive search over all possible nodes. Its intrinsic parallelism makes it an appropriate option for implementation on DSPs, FPGAs, or ASICs. Specifically, this MIMO detection architecture is well suited to WiMax receivers based on IEEE 802.16e/m, in both the downlink (subscriber station) and the uplink (base station).

Thus, the present disclosure provides a method of performing linear-complexity optimal soft multiple-input multiple-output (MIMO) detection in a 2×N_(R) system, the method comprising the steps of: calculating first and second generator matrices using channel pre-processing based upon a channel matrix; applying the generator matrices to a received vector to generate first and second modified received vectors, wherein the first modified received vector comprises an original transmitted vector and the second modified received vector comprises a flipped version of the original transmitted vector; selecting a first element and a second element of the transmitted vector as child and parent symbols, respectively; determining, for both the transmitted vector and the flipped version of the transmitted vector, for each possible value of the parent symbol, a first child by mapping a zero-forcing estimate of the child symbol to the nearest constellation point in an associated constellation scheme using the first and second modified received vectors; adding candidates to a candidate list from the determined parent symbol and its first child symbol for each of the transmitted vector and the flipped version of the transmitted vector; and calculating log-likelihood ratios (LLRs) of all bits for each resulting vector.

Also provided is a method of performing linear-complexity optimal soft multiple-input multiple-output (MIMO) detection for a 2×N_(R) system, the method comprising the steps of: calculating a first generator matrix using channel pre-processing based upon a channel matrix; applying the first generator matrix to a received vector to generate a first modified received vector; selecting, for a transmitted vector, a first element as a child symbol and a second element as a parent symbol; determining, for the transmitted vector, for each possible value of the parent symbol, a first child by mapping a zero-forcing estimate of the child symbol to a nearest constellation point in an associated constellation scheme using the first modified received vector and the channel matrix; adding, for the transmitted vector, candidates to a candidate list of transmitted vectors from the determined parent symbol and its child symbol; calculating log-likelihood ratios (LLRs) of all bits for the parent symbol of the transmitted vector; calculating a second generator matrix using the channel pre-processing based upon a swapped version of the channel matrix, wherein the swapped version of the channel matrix is derived by swapping the columns of the channel matrix; applying the second generator matrix to the received vector to generate a second modified received vector; selecting, for a flipped transmitted vector, a first element as a child symbol and a second element as a parent symbol, wherein the flipped transmitted vector is derived by flipping the rows of the transmitted vector; determining, for the flipped transmitted vector, for each possible value of the parent symbol, a first child by mapping a zero-forcing estimate of the child symbol to a nearest constellation point in an associated constellation scheme using the second modified received vector and the channel matrix; adding, for the flipped transmitted vector, candidates to the candidate list of the flipped transmitted vectors from the determined parent symbol and its first child symbol; and calculating log-likelihood ratios (LLRs) of all bits for the parent symbol of the flipped transmitted vector.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features and advantages of the present disclosure will become apparent from the following detailed description, taken in combination with the appended drawings, in which:

FIG. 1 is a block diagram of the MIMO transmitter and iterative receiver structure;

FIG. 2 shows a method of a linear-complexity optimal soft MIMO detector;

FIG. 3 shows a method of a linear-complexity optimal soft MIMO detector using a QR-decomposition;

FIG. 4 shows an illustrative representation of the first-child calculation process for the specific case of a 2×N_(R) MIMO system with a 4-QAM constellation;

FIG. 5 shows a method of a linear-complexity optimal soft MIMO detector using a simplified First-Child method;

FIG. 6 shows an illustrative representation of the efficient provision of candidates for the LLR calculation without performing an exhaustive search; and

FIG. 7 shows an illustrative representation of the way the LLR values are calculated (taking the value of L(x_(1,1)|y) as an example) based on the First-Child method.

It will be noted that throughout the appended drawings, like features are identified by like reference numerals.

DETAILED DESCRIPTION

Embodiments are described below, by way of example only, with reference to FIGS. 1-7. A novel scalable pipelined architecture for MIMO soft symbol detection, featuring an efficient implementation for a 2×N_(R) MIMO system, is provided. The transmission scheme is based on spatial multiplexing. The method provides a channel-independent scheme with a fixed throughput that is independent of the SNR value. The architecture operates at a significantly lower complexity than currently reported schemes, as its complexity grows linearly with the constellation order, which makes it applicable to a broad range of applications/standards with various constraints on the constellation size. A non-exhaustive search is provided, which is applied twice, once per antenna, with both instances running in parallel. It efficiently searches over a subset of nodes that provides all the LLR values of the transmitted bits, resulting in optimal soft detection at the receiver.

The architecture is also customized for the specific application of uplink collaborative MIMO in the IEEE 802.16e standard. Since the node expansion and LLR calculation cores cooperate on a data-driven basis, and the scheme is applied independently for each antenna, the architecture is well suited to a pipelined parallel VLSI implementation with a fixed critical path length independent of the constellation order.

The detection technique described herein may be used for various wireless MIMO communication systems, including MIMO-OFDM systems. For clarity, the various embodiments are described for the MIMO detection core of a 2×N_(R) MIMO system.

System Model:

In a wireless MIMO system as shown in FIG. 1, with N_(T) transmit antennas 128 and N_(R) receive antennas 140, the equivalent fading channel can be described by a complex-valued N_(R)×N_(T) matrix H. A typical bit-interleaved coded modulation (BICM) MIMO system with an iterative APP receiver 130 is considered. The transmitter 120 receives a binary source 150, where a block of information bits is encoded with a convolutional turbo code 122 and permuted by an interleaver 124. At a specific time instant, the encoded bits of the sequence $x = [x_1, \ldots, x_{M_c N_T}]^T$, as part of the permuted stream, are mapped into a complex vector $s = [s_1, \ldots, s_{N_T}]^T$ by N_(T) linear modulators, in which each element is independently drawn from a complex constellation Ω (symmetric |Ω|-QAM schemes with $M_c = \log_2|\Omega|$ bits per symbol, i.e., $|\Omega| = 2^{M_c}$). For instance, in the case of 4-QAM, |Ω|=4, Ω={−1−j, −1+j, 1−j, 1+j}, and M_(c)=2, meaning there are two bits per transmitted symbol; thus x₁ and x₂ are mapped to s₁, and so on. The QAM-modulated signals are passed through the linear modulator 126, which determines how the QAM-modulated signals are transmitted on the N_(T) antennas 128 at the transmitter side. The complex baseband equivalent model can be expressed as

$y = Hs + n \qquad (3)$

where $y = [y_1, y_2, \ldots, y_{N_R}]^T$ is the N_(R)-dimensional received symbol vector, and $n = [n_1, n_2, \ldots, n_{N_R}]^T$ represents the N_(R)-dimensional independent and identically distributed (i.i.d.) circularly symmetric complex zero-mean Gaussian noise vector with variance σ², i.e., $n_i \sim \mathcal{CN}(0, \sigma^2)$.
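The mapping and channel model of (3) can be sketched in Python for the 4-QAM example. The bit-to-symbol labeling (first bit to the real part, second bit to the imaginary part, with −1/+1 standing for "0"/"1") is an assumption, since the text does not fix one, and the helper names are hypothetical. The noise is drawn as CN(0, σ²), i.e., with variance σ²/2 per real dimension.

```python
import random


def map_4qam(b1, b2):
    """Hypothetical labeling: first bit -> real part, second bit -> imaginary part."""
    return complex(b1, b2)  # bits use -1/+1 for "0"/"1", as in eq. (5)


def channel(H, s, noise_var, rng):
    """y = H s + n with n_i ~ CN(0, noise_var), i.e. noise_var / 2 per real dimension."""
    std = (noise_var / 2) ** 0.5
    return [sum(H[r][c] * s[c] for c in range(len(s)))
            + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
            for r in range(len(H))]


# Usage: x = [x1, x2, x3, x4] -> s1 = map(x1, x2), s2 = map(x3, x4) in a 2x2 system.
x = [+1, -1, -1, +1]
s = [map_4qam(x[0], x[1]), map_4qam(x[2], x[3])]
H = [[0.8 + 0.1j, -0.4j], [0.3, 1.1 - 0.2j]]
y = channel(H, s, noise_var=0.1, rng=random.Random(0))
```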

Hard Detection:

The aim of the MIMO hard detection method is to estimate the transmitted vector ŝ based on the observation y, i.e.,

$\hat{s} = \arg\min_{s} \|y - Hs\|^2 \qquad (4)$

In other words, just as in a two-dimensional constellation the point with the lowest Euclidean distance from the received point is declared to be the transmitted symbol, the ML detector declares the candidate vector s whose image Hs is closest to y to be the transmitted vector. In an N_(R)×N_(T) MIMO system, there are $|\Omega|^{N_T}$ possible transmit vectors to search through. For example, in a 2×N_(R) MIMO system with a 64-QAM constellation scheme, there are 64²=4096 possible symbol vectors to explore. Therefore, even for a 2×2 system with high-order constellation schemes, the computation required to find the optimal point is expensive from an implementation point of view and may exceed the processing power of current state-of-the-art DSPs or FPGAs at the receiver.

Soft Detection:

Since the transmitted bits x are the output of an ECC encoder 122 that introduces redundancy, bit-by-bit decisions are no longer optimal. The a posteriori probability (APP) MIMO detector 132 should make decisions jointly on all blocks using knowledge of the correlation across blocks, and channel decoding is performed using the soft information on all blocks obtained from the APP MIMO detector. Therefore, an iterative receiver that performs joint detection and decoding is required.

An iterative receiver 130 consists of two stages: the soft MIMO APP detector 132, followed by an outer soft ECC decoder 136 providing the binary output 152. The two stages are separated by a deinterleaver 134 and an interleaver 138. FIG. 1 illustrates how the soft information is iterated between the MIMO APP detector 132 and the outer soft ECC decoder 136. The outer soft ECC decoder 136, the deinterleaver 134, and the interleaver 138 together can be identified as a Convolutional Turbo Code (CTC) decoder. Considering each transmit antenna 128 as a layer, the optimal log-likelihood ratio (LLR) of bit x_(k,l), k=1, . . . , M_(c), of the l-th layer given the received vector y is obtained by the APP detector as follows:

$L(x_{k,l} \mid y) = \log \frac{P(x_{k,l} = +1 \mid y)}{P(x_{k,l} = -1 \mid y)}, \qquad (5)$

where −1 and +1 represent bit “0” and bit “1”, respectively. For a known channel H in an additive white Gaussian noise (AWGN) environment, using Bayes' theorem, the above LLR, L(x_(k,l)|y), can be written as:

$L(x_{k,l} \mid y) = \log \frac{P(y \mid x_{k,l} = +1)\, P(x_{k,l} = +1)}{P(y \mid x_{k,l} = -1)\, P(x_{k,l} = -1)} \qquad (6)$

$= \log \frac{\sum_{s \in \chi_{k,l}^{+1}} \exp\left(\frac{-\|y - Hs\|^2}{2\sigma^2}\right)}{\sum_{s \in \chi_{k,l}^{-1}} \exp\left(\frac{-\|y - Hs\|^2}{2\sigma^2}\right)} + \log \frac{P(x_{k,l} = +1)}{P(x_{k,l} = -1)} \qquad (7)$

where $\chi_{k,l}^{+1} = \{s : x_{k,l} = +1\}$ and $\chi_{k,l}^{-1} = \{s : x_{k,l} = -1\}$ denote the sets of all possible vectors that have +1 and −1, respectively, in the k-th bit of the l-th layer. Moreover, the second (a priori) term in (7) can be ignored when the two cases +1 and −1 are equally likely. This formula is computationally complex and needs to be simplified for practical implementation. Employing the max-log approximation, the LLR values L(x_(k,l)|y) can be approximated by keeping only the maximum term in the numerator and in the denominator, as follows:

$L(x_{k,l} \mid y) \cong \log \frac{\max_{s \in \chi_{k,l}^{+1}} \exp\left(\frac{-\|y - Hs\|^2}{2\sigma^2}\right)}{\max_{s \in \chi_{k,l}^{-1}} \exp\left(\frac{-\|y - Hs\|^2}{2\sigma^2}\right)} = \frac{1}{2\sigma^2}\left[\min_{s \in \chi_{k,l}^{-1}} \|y - Hs\|^2 - \min_{s \in \chi_{k,l}^{+1}} \|y - Hs\|^2\right] \qquad (8)$
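As a reference point, the following sketch evaluates the max-log LLRs of (8) for a two-layer system by visiting all |Ω|² vectors, which is exactly the exhaustive cost the First-Child method is designed to avoid. It uses an assumed 4-QAM labeling (bit 1 from the sign of the real part, bit 2 from the sign of the imaginary part); the function names are hypothetical.

```python
import itertools

# Assumed 4-QAM labeling (the text does not fix one): bit 1 is the sign of the
# real part and bit 2 the sign of the imaginary part, with -1/+1 for "0"/"1".
LABELS_4QAM = {complex(b1, b2): (b1, b2) for b1 in (-1, 1) for b2 in (-1, 1)}


def sq_dist(y, H, s):
    """||y - H s||^2 for a two-column channel H stored as a list of rows."""
    return sum(abs(y[r] - H[r][0] * s[0] - H[r][1] * s[1]) ** 2 for r in range(len(y)))


def maxlog_llr_exhaustive(y, H, sigma2, labels=LABELS_4QAM):
    """Max-log LLRs of eq. (8), visiting all |Omega|**2 vectors of a 2-layer system."""
    n_bits = len(next(iter(labels.values())))
    best = {}  # (layer, bit, value) -> smallest ||y - Hs||^2 among vectors with that bit
    for s in itertools.product(labels, repeat=2):
        d = sq_dist(y, H, s)
        for layer in range(2):
            for k, v in enumerate(labels[s[layer]]):
                best[(layer, k, v)] = min(best.get((layer, k, v), float("inf")), d)
    return [[(best[(layer, k, -1)] - best[(layer, k, 1)]) / (2 * sigma2)
             for k in range(n_bits)] for layer in range(2)]


# Usage: identity channel, no noise, s = (1 - 1j, -1 + 1j), so 2 * sigma^2 = 1.
llrs = maxlog_llr_exhaustive([1 - 1j, -1 + 1j], [[1, 0], [0, 1]], sigma2=0.5)
```

A positive LLR favours bit value +1, so in this noiseless case each sign matches the corresponding sign of the transmitted real or imaginary part.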

The optimal detector in an iterative MIMO receiver is well known to be the MAP detector, sometimes also called the a posteriori probability (APP) detector. This detector computes extrinsic information on the channel bits, written in terms of log-likelihood ratios (LLRs). LLR values can be calculated by many different methods, each with its own complexity/performance trade-off. The optimal implementation of the soft ML decoder requires exploring all possible symbol vectors and calculating the LLR value of each individual bit accordingly. Even with the simplification in (8), this incurs a computational complexity of $|\Omega|^{N_T}$ visited points, which is prohibitive for most applications. The soft sphere decoder is one alternative; its idea is to reduce the number of symbol vectors considered in the search that solves the optimization problem in (4), without accidentally excluding the ML solution. This goal is achieved by constraining the search to only those points Hs that lie inside a hypersphere of radius r around the received point y. Since soft detection is of concern, lists of candidates are required at the end to calculate the LLR values. Thus, in the list sphere decoder, as opposed to the hard SD, the radius of the sphere is not decreased as the depth of the tree is expanded. This results in a list of visited points rather than just the ML point, hence the name list sphere decoder (LSD). Note that the LSD produces a list of points that includes the hard ML point, which is the result of the hard SD. Although the list SD visits a subset of nodes much smaller than that of the ML detector, its performance is not guaranteed to be ML, so the list has to be large enough to assure optimality. Moreover, the list sphere decoder reduces only the average complexity compared to the ML detector; its computational complexity has been shown to remain exponential in the number of transmit antennas. The fact that its computational complexity is channel dependent makes it less attractive for practical implementations.

MIMO Detection:

The channel H is assumed to be known at the receiver 130 (e.g., through channel estimation in a preceding training phase). There are linear and non-linear receiver algorithms to separate and detect the simultaneously transmitted data streams, such as MMSE, the maximum-likelihood detector, the sphere decoder, the fixed-sphere decoder, iterative tree search, and distributed ML. Each of these approaches has its own drawbacks, including hardware complexity, channel/SNR dependency, and non-optimal performance. The architecture described in FIG. 2 alleviates these problems.

Detection Method:

In the general case presented here, the two transmitted symbols are allowed to have different modulation levels. This makes the algorithm applicable to both the downlink (subscriber station) and uplink collaborative MIMO (in the IEEE 802.16e standard), where two single-antenna users transmitting in the same time slot and on the same carrier are modeled as a 2×2 MIMO system. The two users can, of course, use constellations that are independent of one another.

FIG. 2 shows the method of the optimal soft MIMO detector with linear complexity, where

$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$

is the received vector, and $\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$ and its flipped version, $\begin{bmatrix} s_2 \\ s_1 \end{bmatrix}$, are considered as the first and second transmitted vectors. The flipped version of the transmitted vector conveys no extra information and is introduced only to ease the description of the process. For $\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$, the corresponding channel matrix is

$H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix},$

whereas for $\begin{bmatrix} s_2 \\ s_1 \end{bmatrix}$ the channel matrix is

$\tilde{H} = \begin{bmatrix} h_{12} & h_{11} \\ h_{22} & h_{21} \end{bmatrix}.$

The algorithm, i.e., steps 200 through 214, is implemented twice: once for the pair $\left( \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}, H \right)$ and once for the pair $\left( \begin{bmatrix} s_2 \\ s_1 \end{bmatrix}, \tilde{H} \right)$. For each pair, channel pre-processing is performed at step 200 to calculate the generator matrix, using either the QR-decomposition block or the D-matrix generation block. The output of the pre-processing block is a matrix, called the generator matrix, that is applied to the received signal vector (equations (9) and (13), or (18) and (20)). The generator matrix is $Q^H$ in the QR-decomposition method or the matrix D in the simplified First-Child scheme. The generator matrix is applied to the received vector at step 202. At step 204, the first element of each transmitted vector is taken as the child symbol and its second element as the parent symbol. For each transmitted vector, all possible values of the parent symbol are considered, and at step 206, for each considered parent symbol, its best (first) child is determined using the zero-forcing estimate based on the modified received vector. The resulting parent symbol and its first child are added to the candidate list at step 208. If not all possible values of the parent symbol have been considered (No at step 210), the process returns to step 206 until all candidates from the respective constellation scheme for the parent symbol have been considered. When all possible values of the parent symbol have been considered (Yes at step 210), the log-likelihood ratios (LLRs) of all bits are calculated for each resulting vector at step 212. The resulting LLR values are provided to the CTC decoder at step 214.
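The flow of FIG. 2 can be sketched end to end in Python. This sketch uses the QR-decomposition variant of the pre-processing (described with reference to FIG. 3) and keeps only the first row of $Q^H y$, which is all the first-child step needs; candidate metrics are then evaluated as ‖y − Hs‖², which equals the QR-domain metric. The alphabet scaling, the bit labeling (bits of a symbol's index in the alphabet list), and all function names are assumptions made for illustration. The usage part compares the LLRs computed from the |Ω₁|+|Ω₂| candidates with those from the full |Ω₁|×|Ω₂| list on random channels; under the max-log approximation of (8), the two should agree up to floating-point round-off.

```python
import itertools
import random


def qam_points(order):
    """Square QAM alphabet on the odd-integer grid (assumed, unnormalized scaling)."""
    m = int(round(order ** 0.5))
    levels = [2 * k - (m - 1) for k in range(m)]
    return [complex(re, im) for re in levels for im in levels]


def slice_qam(x, order):
    """Nearest square-QAM point to x; the grid is separable, so slice each axis."""
    m = int(round(order ** 0.5))

    def axis(v):
        k = min(max(round((v + m - 1) / 2), 0), m - 1)  # nearest level, clipped
        return 2 * k - (m - 1)
    return complex(axis(x.real), axis(x.imag))


def one_tree(y, H, child_order, parent_order):
    """Steps 200-210 for one ordering: one (child, parent) pair per parent value."""
    # Step 200: first column of Q and first row of R via Gram-Schmidt, eq. (9).
    r11 = (abs(H[0][0]) ** 2 + abs(H[1][0]) ** 2) ** 0.5
    q0 = [H[0][0] / r11, H[1][0] / r11]
    r12 = q0[0].conjugate() * H[0][1] + q0[1].conjugate() * H[1][1]
    z1 = q0[0].conjugate() * y[0] + q0[1].conjugate() * y[1]  # step 202
    # Steps 204-210: every parent value gets its first child via ZF + slicing.
    return [(slice_qam((z1 - r12 * p) / r11, child_order), p)
            for p in qam_points(parent_order)]


def candidate_list(y, H, order1, order2):
    """|Omega1| + |Omega2| candidate vectors (s1, s2) from H and the swapped H~."""
    Ht = [[H[0][1], H[0][0]], [H[1][1], H[1][0]]]
    cands = one_tree(y, H, order1, order2)                         # (s1, s2)
    cands += [(p, c) for c, p in one_tree(y, Ht, order2, order1)]  # undo the flip
    return cands


def bits_of(sym, order):
    """Hypothetical labeling: bits of the symbol's index in qam_points, as -1/+1."""
    idx = qam_points(order).index(sym)
    return [1 if (idx >> k) & 1 else -1 for k in range(order.bit_length() - 1)]


def maxlog_llrs(y, H, cands, orders, sigma2):
    """Eq. (8) evaluated over a candidate list (step 212)."""
    best = {}
    for s in cands:
        d = sum(abs(y[r] - H[r][0] * s[0] - H[r][1] * s[1]) ** 2 for r in range(2))
        for layer in range(2):
            for k, v in enumerate(bits_of(s[layer], orders[layer])):
                best[(layer, k, v)] = min(best.get((layer, k, v), float("inf")), d)
    return [[(best[(layer, k, -1)] - best[(layer, k, 1)]) / (2 * sigma2)
             for k in range(orders[layer].bit_length() - 1)] for layer in range(2)]


# Usage: 16-QAM on antenna 1, 4-QAM on antenna 2; compare with the exhaustive list.
rng = random.Random(7)
orders, sigma2, max_gap = (16, 4), 0.5, 0.0
for _ in range(50):
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
    s = (rng.choice(qam_points(16)), rng.choice(qam_points(4)))
    y = [H[r][0] * s[0] + H[r][1] * s[1] + complex(rng.gauss(0, 1), rng.gauss(0, 1))
         for r in range(2)]
    fc = maxlog_llrs(y, H, candidate_list(y, H, *orders), orders, sigma2)
    full = maxlog_llrs(y, H, itertools.product(qam_points(16), qam_points(4)), orders, sigma2)
    max_gap = max(max_gap, max(abs(a - b) for la, lb in zip(fc, full) for a, b in zip(la, lb)))
print(len(candidate_list(y, H, *orders)), max_gap)  # 20 candidates instead of 64
```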

An implementation of the method utilizing QR-decompositions is shown with reference to FIG. 3. Let Ω₁ and Ω₂ denote the constellation schemes of s₁ and s₂, respectively. The QR-decomposition of the channel matrix is denoted as H=QR, shown at step 302, where Q is a 2×2 unitary matrix and R is a 2×2 upper triangular matrix (FIG. 4 shows an example in a 2×2 4-QAM system). At step 304 (or equivalently step 404 in FIG. 4), performing the following nulling operation with $Q^H$ yields:

$\begin{matrix}{z = {{Q^{H}y} = {{{Rs} + v} = {\begin{bmatrix}r_{11} & r_{12} \\0 & r_{22}\end{bmatrix}{\quad{{\begin{bmatrix}s_{1} \\s_{2}\end{bmatrix} + v},}}}}}} & (9)\end{matrix}$

where r₁₁ and r₂₂ are real numbers. Since the nulling matrix $Q^H$ is unitary, the noise $v = Q^H n$ remains spatially white, and $\|y - Hs\|^2 = \|z - Rs\|^2$. Exploiting the triangular nature of R in (9), the ML problem can therefore be expanded as

$\hat{s} = \arg\min_{s} \left\{ |z_2 - r_{22} s_2|^2 + |z_1 - r_{11} s_1 - r_{12} s_2|^2 \right\}. \qquad (10)$
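The nulling step of (9) can be checked numerically with a small classical Gram-Schmidt QR of a 2×2 complex matrix. This is only one possible way to compute the decomposition (a hardware design might use Givens rotations or another method), and the names are hypothetical. The usage lines confirm that the metric of (10), written in terms of z and R, equals ‖y − Hs‖² for a trial vector, and that the diagonal of R is real.

```python
def qr_2x2(H):
    """Classical Gram-Schmidt QR of a full-rank 2x2 complex matrix (list of rows)."""
    c0, c1 = [H[0][0], H[1][0]], [H[0][1], H[1][1]]
    r11 = (abs(c0[0]) ** 2 + abs(c0[1]) ** 2) ** 0.5  # real and positive
    q0 = [c0[0] / r11, c0[1] / r11]
    r12 = q0[0].conjugate() * c1[0] + q0[1].conjugate() * c1[1]
    u = [c1[0] - r12 * q0[0], c1[1] - r12 * q0[1]]     # remove the q0 component
    r22 = (abs(u[0]) ** 2 + abs(u[1]) ** 2) ** 0.5    # real and positive
    q1 = [u[0] / r22, u[1] / r22]
    Q = [[q0[0], q1[0]], [q0[1], q1[1]]]
    R = [[r11, r12], [0.0, r22]]
    return Q, R


def herm_times(Q, y):
    """z = Q^H y, the nulling step of eq. (9)."""
    return [sum(Q[r][c].conjugate() * y[r] for r in range(2)) for c in range(2)]


def metric(y, A, s):
    """||y - A s||^2 for a 2x2 matrix A stored as a list of rows."""
    return sum(abs(y[r] - A[r][0] * s[0] - A[r][1] * s[1]) ** 2 for r in range(2))


# Usage: the metric of eq. (10), ||z - Rs||^2, equals ||y - Hs||^2 for any s.
H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 + 0.1j]]
y = [0.4 - 1.2j, 2.5 + 0.3j]
Q, R = qr_2x2(H)
z = herm_times(Q, y)
trial = (1 - 1j, -1 + 3j)
print(metric(y, H, trial), metric(z, R, trial))  # equal up to round-off
```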

The above problem can be thought of as a tree-based search problem with two levels (404 in FIG. 4), where the first level of the tree corresponds to the second row of the matrix in (9) and the second level corresponds to the first row of (9). Starting from the last row, s₂ (called the parent symbol) is detected first, and based on s₂ the next symbol in the upper row, s₁ (called the child symbol), is detected. Thus, in order to find the optimal solution, all possible values of s₂ in Ω₂ are considered at step 310, resulting in |Ω₂| possible candidates denoted by $\{\hat{s}_2^1, \hat{s}_2^2, \ldots, \hat{s}_2^{|\Omega_2|}\}$ (step 406 in FIG. 4). Each of these candidates has |Ω₁| possible children. However, only the best child of each candidate is selected at step 312 (step 408 in FIG. 4). The best child refers to the child (s₁) that results in the lowest Euclidean distance from the received point. Thus, based on the model in (9), for a specific parent candidate $\hat{s}_2^i$, its first child is determined using the following minimization:

$\hat{s}_1^i = \arg\min_{s_1 \in \Omega_1} |z_1 - r_{11} s_1 - r_{12} \hat{s}_2^i|^2, \qquad (11)$

for all $i \in \{1, \ldots, |\Omega_2|\}$. A simple zero-forcing estimate, i.e., $(z_1 - r_{12}\hat{s}_2^i)/r_{11}$, followed by rounding to the nearest constellation point, can be employed at step 312 to avoid an exhaustive search for the best child.
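The step from (11) to the zero-forcing shortcut can be illustrated as follows: for each parent value, the child obtained by slicing $(z_1 - r_{12}\hat{s}_2^i)/r_{11}$ is compared with the child found by searching all of Ω₁. The first-row values z₁, r₁₁, r₁₂ below are arbitrary illustrative numbers, the 64-QAM alphabet uses an assumed odd-integer grid, and the slicer relies on that grid being separable into real and imaginary axes; the function names are hypothetical.

```python
def slice_qam(x, order):
    """Nearest point of square QAM (odd-integer grid, assumed scaling) to x."""
    m = int(round(order ** 0.5))

    def axis(v):
        k = min(max(round((v + m - 1) / 2), 0), m - 1)  # nearest level, clipped
        return 2 * k - (m - 1)
    return complex(axis(x.real), axis(x.imag))


def first_child(z1, r11, r12, parent, order):
    """ZF estimate of eq. (14) followed by rounding: no search over the child alphabet."""
    return slice_qam((z1 - r12 * parent) / r11, order)


def best_child_bruteforce(z1, r11, r12, parent, order):
    """Eq. (11) by exhaustive search over the child alphabet, for comparison only."""
    m = int(round(order ** 0.5))
    levels = [2 * k - (m - 1) for k in range(m)]
    alphabet = [complex(a, b) for a in levels for b in levels]
    return min(alphabet, key=lambda s1: abs(z1 - r11 * s1 - r12 * parent) ** 2)


# Usage with illustrative first-row values of z and R (r11 is real and positive).
z1, r11, r12 = 3.71 - 9.13j, 1.57, 0.43 + 0.88j
parents = [complex(a, b) for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]  # 16-QAM
agree = all(first_child(z1, r11, r12, p, 64) == best_child_bruteforce(z1, r11, r12, p, 64)
            for p in parents)
```

Because r₁₁ is real and positive, scaling by it does not change which constellation point is nearest, which is why the two functions return the same child.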

Let s_(ML) represent the set of all such candidate pairs, i.e., $s_{ML} = \{[\hat{s}_1^i, \hat{s}_2^i]^T\}_{i=1}^{|\Omega_2|}$, which are added to the candidate list at step 314. At step 316 it is determined whether all values of i have been evaluated; if not (No at step 316), step 312 is performed again. Once all values have been evaluated (Yes at step 316), the log-likelihood ratios (LLRs) of all bits are calculated for each resulting vector at step 318. The resulting LLR values are provided to the CTC decoder at step 319. These LLR values are statistically sufficient to find the optimal values of the corresponding transmitted bits of s₂.

The next step, at step 320, is to swap the columns of H, yielding the matrix $\tilde{H}$. This results in the following model:

$\begin{matrix}{{y = {{{\overset{\sim}{H}\overset{\sim}{s}} + n} = {{\begin{bmatrix}h_{12} & h_{11} \\h_{22} & h_{21}\end{bmatrix}\begin{bmatrix}s_{2} \\s_{1}\end{bmatrix}} + n}}},} & (12)\end{matrix}$

where $\tilde{s} = [s_2\; s_1]^T$. In this case, s₁ is the parent symbol and s₂ is the child symbol, resulting in a new two-level tree (412 in FIG. 4). Taking the same approach as before and applying the QR-decomposition $\tilde{H} = \tilde{Q}\tilde{R}$, the nulling operation at step 322 (or equivalently step 410 in FIG. 4) results in the following:

$\tilde{z} = \tilde{Q}^H y = \tilde{R}\tilde{s} + \tilde{v} = \begin{bmatrix} \tilde{r}_{11} & \tilde{r}_{12} \\ 0 & \tilde{r}_{22} \end{bmatrix} \begin{bmatrix} s_2 \\ s_1 \end{bmatrix} + \tilde{v}, \qquad (13)$

In (13), all possible values of s₁ in Ω₁ are considered at step 324 (step 414 in FIG. 4), and for each of them the corresponding first child is determined (416 in FIG. 4). The new resulting pairs $\{[\hat{s}_1^i, \hat{s}_2^i]^T\}_{i=1}^{|\Omega_1|}$ are added to s_(ML) at step 334. At step 340 it is determined whether all values of i have been evaluated; if not (No at step 340), step 332 is performed again. Therefore, the outputs of steps 316 and 340 are a total of |Ω₁|+|Ω₂| pairs in the set s_(ML), based on which the LLR values are calculated at step 342 and soft ML detection is performed. The resulting LLR values are provided to the CTC decoder at step 319 for further processing.

It will be shown below (Theorem) that the above approach achieves optimal ML detection, so there is no performance loss associated with the First-Child detector compared to the ML detector. Note that while the First-Child detector has the same performance as the ML detector, it comes with a much lower complexity, which scales linearly with the constellation size. This is because in the First-Child scheme the total number of visited branches is |Ω₁|+|Ω₂|, as opposed to |Ω₁|×|Ω₂| branches in the ML detection scheme. The complexity gap widens especially for high-order constellation schemes: for example, with 256-QAM the First-Child approach searches over only about 0.8% of the points that the ML detector explores, because 2×256/256² = 1/128 ≅ 0.0078, a significant reduction in complexity.
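The branch counts behind this comparison are easy to tabulate. The short sketch below assumes the same constellation on both antennas in the loop (the helper name is hypothetical); for 256-QAM it gives 512 versus 65,536 branches, a fraction of exactly 1/128.

```python
def visited_branches(order1, order2):
    """Branches visited by First-Child (|Omega1| + |Omega2|) and by exhaustive ML."""
    return order1 + order2, order1 * order2


for order in (4, 16, 64, 256):
    fc, ml = visited_branches(order, order)
    print(f"{order}-QAM: First-Child {fc}, ML {ml}, fraction {fc / ml:.2%}")
```

The same helper also covers mixed constellations, e.g. `visited_branches(16, 4)` for the uplink collaborative case in which the two users use different QAM orders.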

In the sequel, the method for finding the first child of each candidate without visiting all the possible children is described. In (9), $\hat{s}_2^i$ is assumed to be the current parent candidate. Using the first row of (9), the zero-forcing estimate of s₁ can be found as

$z_1^{ZF} = (z_1 - r_{12}\hat{s}_2^i)/r_{11}. \qquad (14)$

In order to find the first child of $\hat{s}_2^i$, $z_1^{ZF}$ needs to be rounded to the nearest constellation point in Ω₁. This is exactly the child that minimizes the Euclidean distance between the candidate pair and the received signal, because the Euclidean distance can be written as $|z_2 - r_{22}s_2|^2 + r_{11}^2 \left|(z_1 - r_{12}s_2)/r_{11} - s_1\right|^2$, where the first term of this sum is independent of s₁ and the second term is minimized by choosing the constellation point closest to $(z_1 - r_{12}s_2)/r_{11}$.

The proof of the optimality of the above scheme is presented in the Theorem below. The above approach can be easily extended to a general 2×N_(R) system. The QR-decomposition of an N_(R)×2 channel matrix, H=QR, results in an N_(R)×N_(R) unitary matrix Q and an N_(R)×2 upper triangular matrix R whose last N_(R)−2 rows are all zero. Thus, after the QR-decomposition, both sides of equation (3) are multiplied by $Q^H$, and, taking the first two rows of the resulting equation, the detection process is carried out in exactly the same way as in a 2×2 system.
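The 2×N_(R) reduction can be sketched with a thin QR-decomposition, i.e., only the first two columns of Q, which are all that the first two rows of $Q^H y$ depend on. The Gram-Schmidt routine, the 2×4 example values, and the names are illustrative assumptions. The usage lines check that ‖y − Hs‖² and ‖z − Rs‖² differ by the same constant (the energy of y outside the column space of H) for different s, so both metrics yield the same detection decisions and the same LLR differences.

```python
def thin_qr(H):
    """Gram-Schmidt on an N_R x 2 complex matrix: H = Q1 R, Q1 with orthonormal columns."""
    n = len(H)
    c0 = [H[r][0] for r in range(n)]
    c1 = [H[r][1] for r in range(n)]
    r11 = sum(abs(v) ** 2 for v in c0) ** 0.5
    q0 = [v / r11 for v in c0]
    r12 = sum(q.conjugate() * v for q, v in zip(q0, c1))
    u = [v - r12 * q for q, v in zip(q0, c1)]
    r22 = sum(abs(v) ** 2 for v in u) ** 0.5
    q1 = [v / r22 for v in u]
    return (q0, q1), [[r11, r12], [0.0, r22]]


def reduce_to_2x2(H, y):
    """First two rows of Q^H y; the remaining N_R - 2 rows do not depend on s."""
    (q0, q1), R = thin_qr(H)
    z = [sum(q.conjugate() * v for q, v in zip(qc, y)) for qc in (q0, q1)]
    return z, R


def metric(y, A, s):
    """||y - A s||^2 for a two-column matrix A stored as a list of rows."""
    return sum(abs(y[r] - A[r][0] * s[0] - A[r][1] * s[1]) ** 2 for r in range(len(y)))


# Usage: a hypothetical 2 x 4 system (N_R = 4).
H = [[1 + 1j, 0.2], [0.5j, -1 + 0.3j], [0.7, 0.4 - 0.9j], [-0.2 + 0.6j, 1.1]]
y = [0.3 - 2.1j, 1.4 + 0.2j, -0.8 + 1.5j, 2.2 - 0.4j]
z, R = reduce_to_2x2(H, y)
gaps = [metric(y, H, s) - metric(z, R, s) for s in ((1 + 1j, -1 + 1j), (3 - 1j, 1 - 3j))]
```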

Simplified First-Child:

The above method for soft detection is based on the QR-decomposition. Since each QR-decomposition requires two divisions, its hardware realization might require a complex hardware core. Moreover, depending on the underlying method, a fixed-point implementation of the QR-decomposition can be unstable, especially for ill-conditioned channels, because of the projections intrinsic to QR-decomposition methods. As an alternative to the QR-decomposition, the Simplified First-Child scheme avoids its computational complexity while still providing a framework to implement the First-Child detector.

The motivation is to develop a simplified version of the above scheme that does not require a QR-decomposition, resulting in lower hardware complexity while preserving the optimality of the soft detection method.

The complex baseband equivalent model considered for a 2×2 MIMO system in (3) can be rewritten as:

$\begin{matrix}{y = {{{Hs} + n} = {{\begin{bmatrix}h_{11} & h_{12} \\h_{21} & h_{22}\end{bmatrix}s} + {n.}}}} & (15)\end{matrix}$

Let

${a = \frac{h_{11}}{H_{1}}},{and}$ ${b = \frac{h_{21}}{H_{1}}},$where ∥H₁∥=[h₁₁ h₂₁]^(T), and ∥H₁∥=|h₁₁|²+|h₂₁|², denoting the norm ofthe first column of H. Based on these definitions, a matrix D is definedas follows:

$\begin{matrix}{D = {\begin{bmatrix}a^{*} & b^{*} \\{- b} & a\end{bmatrix}.}} & (16)\end{matrix}$

In fact, the matrix D is used instead of the QR-decomposition to triangularize the channel matrix. In other words, applying D to H removes the interference of one signal from the other, i.e.,

$\begin{matrix}{{{D*H} = {{\begin{bmatrix}a^{*} & b^{*} \\{- b} & a\end{bmatrix} \times \begin{bmatrix}h_{11} & h_{12} \\h_{21} & h_{22}\end{bmatrix}} = \begin{bmatrix}1 & h_{12}^{\prime} \\0 & h_{22}^{\prime}\end{bmatrix}}},} & (17)\end{matrix}$

where h₁₂′=[a* b*]×H₂ and h₂₂′=[−b a]×H₂, with H₂=[h₁₂ h₂₂]^(T) denoting the second column of H. The matrix D is unitary up to a real scalar factor, since

$D^{H} \times D = D \times D^{H} = \begin{bmatrix}{|a|^{2} + |b|^{2}} & 0 \\ 0 & {|a|^{2} + |b|^{2}}\end{bmatrix} = \left( |a|^{2} + |b|^{2} \right)I,$

so its application to the received signal does not incur the noise enhancement problem and preserves optimality: the noise remains spatially white, and its variance is scaled by the same factor |a|²+|b|² in both rows. For a known channel matrix, the application of D to (15) can be written as:

$\begin{matrix}{z = {D \times y} = {{D \times H \times s} + {D \times n}}} & (18) \\ {z = {{\begin{bmatrix}1 & h_{12}^{\prime} \\ 0 & h_{22}^{\prime}\end{bmatrix}\begin{bmatrix}s_{1} \\ s_{2}\end{bmatrix}} + v}} & (19)\end{matrix}$

where v=D×n.
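The triangularization in (17) and the scaled-unitary property of D can be checked with the small pure-Python sketch below. It assumes, as in the text, that ∥H₁∥ stands for |h₁₁|²+|h₂₁|²; the channel values are arbitrary illustrative numbers and the helper names are hypothetical.

```python
def make_d(h11, h21):
    """D of (16) with ||H1|| taken as |h11|^2 + |h21|^2 (no square root)."""
    n1 = abs(h11) ** 2 + abs(h21) ** 2
    a, b = h11 / n1, h21 / n1
    return [[a.conjugate(), b.conjugate()], [-b, a]], n1


def matmul(A, B):
    """Product of small matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


H = [[0.8 + 0.6j, -0.2 + 1.1j],
     [0.3 - 0.9j, 0.7 + 0.4j]]
D, n1 = make_d(H[0][0], H[1][0])
DH = matmul(D, H)                 # [[1, h12'], [0, h22']] up to rounding
DDh = matmul(D, hermitian(D))     # (|a|^2 + |b|^2) I, i.e. I / n1 with this normalization
print(DH)
print(DDh)
```

With this normalization the first element of D×H is exactly one, which is what later allows the division by r₁₁ to be dropped; the price is that D is unitary only up to the scalar 1/n1.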

Comparing (19) with (9), the interference of s₁ is cancelled from s₂ in the last row, which makes it possible to detect s₂ independently. Because of the soft nature of the detection scheme, this allows all the possible values of s₂ to be considered, based on which the best candidates for s₁ are selected. The result is a complementary list of candidates (i.e., s_(FC)) that provides sufficient information to calculate the LLR values of s₂. Note that, as opposed to the real value r₂₂ in (9), the parameter h₂₂′ in (19) is a complex number.

Since D is unitary up to a scalar factor, the noise remains white, with the same variance in both rows. This is in contrast to the common nulling operations in ZF or MMSE, where the main goal is interference cancellation: in those approaches, interference cancellation comes with noise enhancement, whereas in the present disclosure the interference cancellation is performed without enhancing the noise. This provides a suitable framework for optimal soft detection. After applying the matrix D to the received vector, s₂ can be detected. Since optimal soft detection is the goal, all the possible values of s₂ are considered. For each value of s₂, the best candidate for s₁, i.e., the candidate resulting in the lowest Euclidean distance from the received signal, is determined based on the first row of (19). Using this strategy, the exhaustive search is avoided while the optimal LLR values for s₂ are calculated. The flowchart of the simplified First-Child scheme is shown in FIG. 5.

In order to calculate the LLR values of s₁, the same process is performed for the flipped version of the channel, i.e., H̃ in (12). Based on equation (12), a new matrix D̃ is defined and applied to both sides of (12), resulting in:

$\begin{matrix}{\overset{\sim}{z} = {\overset{\sim}{D} \times y} = {{\overset{\sim}{D} \times \overset{\sim}{H} \times \begin{bmatrix}s_{2} \\ s_{1}\end{bmatrix}} + {\overset{\sim}{D} \times n}}} & (20) \\ {\overset{\sim}{z} = {{\begin{bmatrix}1 & h_{11}^{\prime} \\ 0 & h_{21}^{\prime}\end{bmatrix}\begin{bmatrix}s_{2} \\ s_{1}\end{bmatrix}} + \overset{\sim}{v}}} & (21)\end{matrix}$

where

$\begin{matrix}{{\overset{\sim}{D} = \begin{bmatrix}c^{*} & d^{*} \\ {- d} & c\end{bmatrix}},\quad {c = \frac{h_{12}}{\|H_{2}\|}},\quad {d = \frac{h_{22}}{\|H_{2}\|}},} & (22)\end{matrix}$

H₂=[h₁₂ h₂₂]^(T) is the second column of H, ∥H₂∥=|h₁₂|²+|h₂₂|², and ṽ=D̃×n.

Thus the LLR values of the second transmitted symbol (user), s₂, are calculated based on (19), while the LLR values associated with the first symbol (user), s₁, are determined using equations (20)-(22). The derivation of (19) and (21) requires only two matrix multiplications and requires neither a projection function nor a square-root function.

Due to the structure of the matrix D, the first elements of D×H and D̃×H̃ are always unity, which implies that the division by r₁₁ required in the QR-based implementation in (14) is always avoided in this scheme. This results in fewer divisions and a more stable fixed-point implementation.

Referring to FIG. 5, the input parameters y and H are provided, and the following parameters are initialized:

$\begin{matrix}{{\text{Set}\ s_{FC} = \{\}\ \text{and}\ {\overset{\sim}{s}}_{FC} = \{\}.}} & \left. 1 \right) \\ {{a = \frac{h_{11}}{\|H_{1}\|}},\ {b = \frac{h_{21}}{\|H_{1}\|}},\ {c = \frac{h_{12}}{\|H_{2}\|}},\ {d = \frac{h_{22}}{\|H_{2}\|}}.} & \left. 2 \right) \\ {{h_{12}^{\prime} = {a^{*}h_{12}} + {b^{*}h_{22}}},\ {h_{22}^{\prime} = {- bh_{12}} + {ah_{22}}},\ {h_{11}^{\prime} = {c^{*}h_{11}} + {d^{*}h_{21}}},\ {h_{21}^{\prime} = {- dh_{11}} + {ch_{21}}}.} & \left. 3 \right)\end{matrix}$
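Initialization steps 1) to 3) can be sketched as a single function. This is a hypothetical illustration for a 2×2 channel stored as `[[h11, h12], [h21, h22]]`, again taking ∥H₁∥ and ∥H₂∥ as squared norms as in the text; it returns the empty candidate lists together with a, b, c, d and the four modified channel coefficients.

```python
def initialize(H):
    """Steps 1)-3) of FIG. 5 for a 2x2 channel H = [[h11, h12], [h21, h22]]."""
    (h11, h12), (h21, h22) = H
    s_fc, s_fc_tilde = [], []                          # step 1: empty candidate lists
    n1 = abs(h11) ** 2 + abs(h21) ** 2                 # ||H1||, squared norm
    n2 = abs(h12) ** 2 + abs(h22) ** 2                 # ||H2||, squared norm
    a, b = h11 / n1, h21 / n1                          # step 2
    c, d = h12 / n2, h22 / n2
    h12p = a.conjugate() * h12 + b.conjugate() * h22   # step 3: second column of D x H
    h22p = -b * h12 + a * h22
    h11p = c.conjugate() * h11 + d.conjugate() * h21   # second column of D~ x H~
    h21p = -d * h11 + c * h21
    return {"s_fc": s_fc, "s_fc_tilde": s_fc_tilde,
            "a": a, "b": b, "c": c, "d": d,
            "h12p": h12p, "h22p": h22p, "h11p": h11p, "h21p": h21p}


params = initialize([[0.8 + 0.6j, -0.2 + 1.1j], [0.3 - 0.9j, 0.7 + 0.4j]])
print(params["h12p"], params["h22p"], params["h11p"], params["h21p"])
```

Only these four coefficients (and the first rows of D and D̃) are needed by the two detection passes, which is why the initialization involves no square roots.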

For the first pass, which processes the second transmitted symbol s₂,

$D \leftarrow \begin{bmatrix}a^{*} & b^{*} \\ {- b} & a\end{bmatrix}$ is computed at step 502. Then z=[z₁ z₂]^(T)=D×y is calculated at step 504. For i=1:|Ω₂|, the first child of each s₂′ is determined by mapping (z₁−h₁₂′s₂′) to its nearest point ŝ₁′ in the constellation Ω₁ at step 510, and the resulting pair {ŝ₁′,ŝ₂′} is added to the s_(FC) candidate list at step 512. Steps 510 and 512 are repeated while not all values of s₂ have been considered (No at step 514). Once all values have been considered (Yes at step 514), the LLR values of s₂ are calculated based on s_(FC).
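Steps 502 to 514 can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: it assumes square QAM on the unnormalized odd-integer grid, a 2×2 channel given as a list of rows, and hypothetical function names. The helper `check_first_pass` confirms on random data that each first child is the brute-force best s₁ for its parent s₂ in the original ∥y−Hs∥² metric.

```python
import random


def square_qam(m):
    """Square m-QAM on the unnormalized odd-integer grid."""
    side = int(round(m ** 0.5))
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(re, im) for re in levels for im in levels]


def nearest_point(x, m):
    """Slice x to the nearest square m-QAM point, axis by axis."""
    side = int(round(m ** 0.5))

    def level(v):
        k = min(max(round((v + side - 1) / 2), 0), side - 1)
        return 2 * k - (side - 1)

    return complex(level(x.real), level(x.imag))


def first_pass(y, H, m1, m2):
    """Steps 502-514 of FIG. 5: one first child s1_hat for every parent s2 in Omega2."""
    (h11, h12), (h21, h22) = H
    n1 = abs(h11) ** 2 + abs(h21) ** 2                 # ||H1|| as defined in the text
    a, b = h11 / n1, h21 / n1
    # Steps 502/504: z = D y with D = [[a*, b*], [-b, a]]. Only z1 is needed to pick
    # the first children; z2 would only enter the metric.
    z1 = a.conjugate() * y[0] + b.conjugate() * y[1]
    h12p = a.conjugate() * h12 + b.conjugate() * h22
    s_fc = []
    for s2 in square_qam(m2):                          # loop closed at step 514
        s1_hat = nearest_point(z1 - h12p * s2, m1)     # step 510
        s_fc.append((s1_hat, s2))                      # step 512
    return s_fc


def check_first_pass(trials=100, m1=16, m2=4, seed=3):
    """True if every first child is the brute-force best s1 for its parent s2."""
    rng = random.Random(seed)
    for _ in range(trials):
        H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
        y = [complex(rng.gauss(0, 4), rng.gauss(0, 4)) for _ in range(2)]
        for s1_hat, s2 in first_pass(y, H, m1, m2):
            def metric(t):
                return sum(abs(y[i] - H[i][0] * t - H[i][1] * s2) ** 2 for i in range(2))
            if abs(metric(s1_hat) - min(metric(t) for t in square_qam(m1))) > 1e-9:
                return False
    return True
```

The check works because ∥D(y−Hs)∥² is ∥y−Hs∥² scaled by the same constant for every candidate, so the minimizing s₁ is unchanged by applying D.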

For the second pass, which processes the first transmitted symbol s₁,

$\overset{\sim}{D} \leftarrow \begin{bmatrix}c^{*} & d^{*} \\ {- d} & c\end{bmatrix}$ is computed at step 520. Then z̃=[z̃₁ z̃₂]^(T)=D̃×y is calculated at step 522. For i=1:|Ω₁|, the first child of each s₁′ is determined by mapping (z̃₁−h₁₁′s₁′) to its nearest point ŝ₂′ in the constellation Ω₂ at step 528, and the resulting pair {ŝ₁′,ŝ₂′} is added to the s̃_(FC) candidate list at step 530. Steps 528 and 530 are repeated while not all values of s₁ have been considered (No at step 532). Once all values have been considered (Yes at step 532), the LLR values of s₁ are calculated based on s̃_(FC). The LLRs for s₁ and s₂ are provided to the CTC decoder at step 540 for processing.
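The flipped-channel pass (steps 520 to 532) mirrors the first pass with the roles of the two symbols exchanged. The sketch below makes the same assumptions as before (unnormalized square QAM, hypothetical names, ∥H₂∥ taken as |h₁₂|²+|h₂₂|²) and checks that each first child ŝ₂′ is the brute-force best s₂ for its parent s₁.

```python
import random


def square_qam(m):
    """Square m-QAM on the unnormalized odd-integer grid."""
    side = int(round(m ** 0.5))
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(re, im) for re in levels for im in levels]


def nearest_point(x, m):
    """Slice x to the nearest square m-QAM point, axis by axis."""
    side = int(round(m ** 0.5))

    def level(v):
        k = min(max(round((v + side - 1) / 2), 0), side - 1)
        return 2 * k - (side - 1)

    return complex(level(x.real), level(x.imag))


def second_pass(y, H, m1, m2):
    """Steps 520-532 of FIG. 5 on the flipped channel: one first child s2_hat
    for every parent s1 in Omega1."""
    (h11, h12), (h21, h22) = H
    n2 = abs(h12) ** 2 + abs(h22) ** 2                 # ||H2|| as defined in the text
    c, d = h12 / n2, h22 / n2
    # Steps 520/522: z~ = D~ y with D~ = [[c*, d*], [-d, c]]; only z~1 is needed here.
    zt1 = c.conjugate() * y[0] + d.conjugate() * y[1]
    h11p = c.conjugate() * h11 + d.conjugate() * h21
    s_fc_tilde = []
    for s1 in square_qam(m1):                          # loop closed at step 532
        s2_hat = nearest_point(zt1 - h11p * s1, m2)    # step 528
        s_fc_tilde.append((s1, s2_hat))                # step 530
    return s_fc_tilde


def check_second_pass(trials=100, m1=4, m2=16, seed=4):
    """True if every first child is the brute-force best s2 for its parent s1."""
    rng = random.Random(seed)
    for _ in range(trials):
        H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
        y = [complex(rng.gauss(0, 4), rng.gauss(0, 4)) for _ in range(2)]
        for s1, s2_hat in second_pass(y, H, m1, m2):
            def metric(t):
                return sum(abs(y[i] - H[i][0] * s1 - H[i][1] * t) ** 2 for i in range(2))
            if abs(metric(s2_hat) - min(metric(t) for t in square_qam(m2))) > 1e-9:
                return False
    return True
```

Together with the first pass, this yields |Ω₁|+|Ω₂| candidates in total, one per parent value of each symbol.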

The simplified First-Child method provides an efficient way to calculate the LLR values. FIG. 6 shows an example for a 2×2 4-QAM MIMO system, where the 4×4 matrix represents all the possible combinations of the transmitted symbols for s₁ (columns) and s₂ (rows). The e_(ij) values represent the Euclidean distance between y and H[s₁ s₂]^(T). The figure shows how the LLR value of the first bit of s₂ is calculated based on the candidate list s_(FC) generated by the First-Child scheme, where Box 1 602 represents all the possible cases in which the first bit of s₂ is "0", whereas Box 2 604 represents all the possible cases in which the first bit of s₂ is "1". To calculate the LLR, the minimum e_(ij) value in each box is found first, and the two minima are then subtracted. In the ML detector, this requires the calculation of all e_(ij) values. In the First-Child method, the minimum value of e_(ij) in each row is determined directly, without calculating the other values of that row: each row corresponds to one parent candidate, and this local minimization is exactly the First-Child search for that parent symbol. Once the local minima are found in each row (gray circles 610, 612, 614 and 616 in (b)), the minimum over the first 620 and second 622 rows gives the minimum value of Box 1, and the minimum over the third 624 and fourth 626 rows gives the minimum value of Box 2. The row minima 610, 612, 614 and 616 are then used to calculate the LLR values of both the first and the second bit of s₂.

This process is also shown pictorially in FIG. 7, where the LLR of the first bit of s₁ (x_(1,1)) is calculated. In FIG. 7, the first two branches of 616 in the tree correspond to the box 602, while the two branches of 716 on the right correspond to the box 604 in FIG. 6. The first level of minimization in 618 corresponds to the first-child calculation and the local minimizations in each row, while the second level of minimization in 720 represents the minimum Partial Euclidean Distance (PED) in each box.
The same process is performed column-wise to calculate the LLR values of s₁, by considering vertical boxes in this case. Assuming Ω₁=Ω₂=Ω, the number of searches is reduced from |Ω|² to 2|Ω|, so the saving, a factor of |Ω|/2, grows exponentially with the number of bits per symbol.
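The box-minimum argument of FIG. 6 can be illustrated with a small Python sketch: the max-log LLR of a bit of s₂, computed exhaustively from the full table of e_(ij) values, equals the LLR computed from the row minima alone. The 4-QAM row order and the bit labels below are assumptions taken from the FIG. 4 example quoted in the proof (bit 1 is +1 for Re(s₂)>0, bit 2 is +1 for Im(s₂)<0), the 1/(2σ²) scaling follows (23), and the table entries are random placeholders rather than real distances.

```python
import random


def llr_exhaustive(e, bits, sigma2):
    """Max-log LLR as in (23) from the full metric table.

    e[i][j]: metric of the i-th s2 value (row) and the j-th s1 value (column).
    bits[i]: +/-1 label of the bit of interest for the i-th s2 value."""
    neg = min(e[i][j] for i in range(len(e)) if bits[i] == -1 for j in range(len(e[i])))
    pos = min(e[i][j] for i in range(len(e)) if bits[i] == +1 for j in range(len(e[i])))
    return (neg - pos) / (2 * sigma2)


def llr_from_row_minima(row_min, bits, sigma2):
    """Same LLR from the row minima only (one first child per parent s2)."""
    neg = min(v for v, b in zip(row_min, bits) if b == -1)
    pos = min(v for v, b in zip(row_min, bits) if b == +1)
    return (neg - pos) / (2 * sigma2)


# Hypothetical 4-QAM rows ordered as s2 in [-1-1j, -1+1j, 1-1j, 1+1j].
BIT1 = [-1, -1, +1, +1]   # +1 when Re(s2) > 0
BIT2 = [+1, -1, +1, -1]   # +1 when Im(s2) < 0

rng = random.Random(0)
e = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(4)]
# In the detector each row minimum is the First-Child metric; here it is read off the table.
row_min = [min(row) for row in e]
for bits in (BIT1, BIT2):
    print(llr_exhaustive(e, bits, 0.5), llr_from_row_minima(row_min, bits, 0.5))
```

The same four row minima serve both bits, which is why |Ω₂| first children suffice for all the LLRs of s₂.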

The above method can be applied to any 2×N_(R) system. The general idea is to generate a generalized D matrix that performs the nulling operation while avoiding noise enhancement, and at the same time makes all the elements of the first column of H zero except the first one. For instance, for N_(R)=4, the matrix D can be written as follows:

$D = \begin{bmatrix}r_{1} & 0 & r_{2} & 0 \\ 0 & r_{1} & 0 & r_{2} \\ {- r_{2}} & 0 & r_{1} & 0 \\ 0 & {- r_{2}} & 0 & r_{1}\end{bmatrix}\begin{bmatrix}a^{*} & b^{*} & 0 & 0 \\ {- b} & a & 0 & 0 \\ 0 & 0 & c^{*} & d^{*} \\ 0 & 0 & {- d} & c\end{bmatrix},$ where $a = \frac{h_{11}}{r_{1}},\ b = \frac{h_{21}}{r_{1}},\ c = \frac{h_{31}}{r_{2}},\ d = \frac{h_{41}}{r_{2}},\ r_{1} = \sqrt{|h_{11}|^{2} + |h_{21}|^{2}},\ r_{2} = \sqrt{|h_{31}|^{2} + |h_{41}|^{2}}.$ Note that the D matrix defined above is unitary up to the scalar factor r₁²+r₂² (i.e., D×D^(H)=(r₁²+r₂²)I). It makes the first element of the first column of H equal to r₁²+r₂², which becomes unity after D is divided by r₁²+r₂², while the rest of the elements of the first column become zero. In other words, with this normalization:

$H^{\prime} = {{D*H} = {\begin{bmatrix}1 & h_{12}^{\prime} \\0 & h_{22}^{\prime} \\0 & h_{32}^{\prime} \\0 & h_{42}^{\prime}\end{bmatrix}.}}$

Based on this mathematical formulation, all the possible values of s₂ are again considered, and for each of them the first child is calculated based on the first row of H′. This calculation is also repeated for the flipped version of H (i.e., H̃), and the resulting first children are sent to the LLR calculation core to calculate the LLR values.
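The N_(R)=4 construction can be checked numerically. The sketch below builds D exactly as the product of the two 4×4 matrices above, with illustrative channel values and hypothetical helper names, and verifies that D zeroes all but the first entry of the first column of H and that D×D^(H) is (r₁²+r₂²) times the identity. The normalized matrix `D_unit` is one way to obtain the unity first element of H′.

```python
import math


def matmul(A, B):
    """Product of small matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def generalized_d(h1):
    """D for N_R = 4 built from the first column h1 = [h11, h21, h31, h41] of H.

    Returns D and the scalar r1^2 + r2^2 by which D is unitary up to scale."""
    h11, h21, h31, h41 = h1
    r1 = math.sqrt(abs(h11) ** 2 + abs(h21) ** 2)
    r2 = math.sqrt(abs(h31) ** 2 + abs(h41) ** 2)
    a, b, c, d = h11 / r1, h21 / r1, h31 / r2, h41 / r2
    outer = [[r1, 0, r2, 0], [0, r1, 0, r2], [-r2, 0, r1, 0], [0, -r2, 0, r1]]
    inner = [[a.conjugate(), b.conjugate(), 0, 0], [-b, a, 0, 0],
             [0, 0, c.conjugate(), d.conjugate()], [0, 0, -d, c]]
    return matmul(outer, inner), r1 ** 2 + r2 ** 2


H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 1.1 + 0.2j],
     [0.9 - 0.4j, -0.6 + 0.8j], [0.2 + 0.1j, 0.4 - 1.5j]]
D, scale = generalized_d([row[0] for row in H])
D_unit = [[x / scale for x in row] for row in D]   # normalized so that H'[0][0] = 1
H_prime = matmul(D_unit, H)
print([round(abs(row[0]), 12) for row in H_prime])
```

Unlike the 2×2 case, this construction uses square roots for r₁ and r₂, as in the formula above.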

Theorem:

The First-Child method provides the exact ML solution for a 2×N_(R) MIMOsystem.

Proof:

In a 2×N_(R) MIMO system, two symbols with constellations Ω₁ and Ω₂ are transmitted at the same time; for instance, if s₁ is transmitted with 4-QAM, then |Ω₁|=4 and Ω₁={−1−j,−1+j,1−j,1+j}. Let M_(c1)=log₂(|Ω₁|) and M_(c2)=log₂(|Ω₂|). Using the definition of the log-likelihood ratio, the LLR value of the k-th bit of the l-th symbol is derived based on (8). Since two symbols are transmitted simultaneously in the WiMAX framework (l∈{1, 2}), the optimum ML soft demodulation requires the LLR computation to visit all the constellation points in the two-dimensional received signal space. Therefore, in order to compute the LLR values in (9) for a specific bit, the whole space needs to be explored. For instance, the LLR value of the k-th bit of the second symbol, Λ₂^((k)), can be written as:

$\begin{matrix}{{L\left( x_{k,2} \middle| y \right)} = {{\frac{1}{2\sigma^{2}}{\min_{\chi_{k,2}^{- 1}}\left\| {y - {H \cdot \begin{bmatrix}s_{1} \\ s_{2}\end{bmatrix}}} \right\|^{2}}} - {\frac{1}{2\sigma^{2}}{\min_{\chi_{k,2}^{+ 1}}\left\| {y - {H \cdot \begin{bmatrix}s_{1} \\ s_{2}\end{bmatrix}}} \right\|^{2}}}}} & (23)\end{matrix}$

This means that the metric $\left\| {y - {H \cdot \begin{bmatrix}s_{1} \\ s_{2}\end{bmatrix}}} \right\|^{2}$ needs to be calculated for all the cases in which the k-th bit of s₂ maps to "0", and also for all the cases in which it maps to "1". Each of these two sets includes 2^(M_(c2)−1)×2^(M_(c1)) points, so the calculation of (23) requires exploring 2^(M_(c1)+M_(c2)−1) points for x_(k,2)=+1 and the same number of points for x_(k,2)=−1, i.e., a total of 2^(M_(c1)+M_(c2)) constellation points. The same computation must be repeated for every bit of the two layers. Since there are M_(c1) bits in the first symbol and M_(c2) bits in the second symbol, i.e., M_(c1)+M_(c2) bits in total, this results in a total computation over (M_(c1)+M_(c2))×2^(M_(c1)+M_(c2)) constellation points. In other words, the complexity of ML detection is proportional to the product |Ω₁|×|Ω₂|, and hence exponential in the number of bits per symbol.
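The counting argument above can be reproduced by brute-force enumeration. In this hedged sketch, symbols are indexed 0 to 2^M−1 and, as a hypothetical labeling, bit k of s₂ is taken as bit k of its index (any labeling in which each bit splits the constellation in half gives the same counts); the function names are illustrative.

```python
from itertools import product


def ml_points_per_hypothesis(mc1, mc2, k):
    """Count the (s1, s2) pairs searched by ML for each value of bit k of s2."""
    zero = one = 0
    for _, i2 in product(range(2 ** mc1), range(2 ** mc2)):
        if (i2 >> k) & 1:
            one += 1
        else:
            zero += 1
    return zero, one


def ml_total_points(mc1, mc2):
    """(M_c1 + M_c2) x 2^(M_c1 + M_c2): the full search repeated for every bit."""
    return (mc1 + mc2) * 2 ** (mc1 + mc2)


def first_child_total_points(mc1, mc2):
    """|Omega1| + |Omega2| first children, shared by the LLRs of all bits."""
    return 2 ** mc1 + 2 ** mc2


for mc in (2, 4, 6, 8):   # 4-, 16-, 64- and 256-QAM on both layers
    print(mc, ml_total_points(mc, mc), first_child_total_points(mc, mc))
```

For two 64-QAM layers, for example, the repeated ML search touches 12×4096 points, against 128 first children.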

We now prove that the proposed approach calculates the same LLR values as ML for the second symbol s₂; the derivation for s₁ is the same. We therefore focus on the LLR calculation for s₂ by ML in (23). Let

$s = \begin{bmatrix}s_{1} \\ s_{2}\end{bmatrix}$ and z=Q^(H)y. Since the matrix Q is unitary (i.e., Q^(H)Q=I), ∥y−Hs∥²=∥z−Rs∥², and equation (23) can be rewritten as

$\begin{matrix}{{L\left( x_{k,2} \middle| y \right)} = {{\frac{1}{2\sigma^{2}}{\min_{\chi_{k,2}^{- 1}}\left\| {z - {Rs}} \right\|^{2}}} - {\frac{1}{2\sigma^{2}}{\min_{\chi_{k,2}^{+ 1}}\left\| {z - {Rs}} \right\|^{2}}}}.} & (24)\end{matrix}$

Using the upper triangular structure of R in (9), L(x_(k,2)|y) in the above expression can be further expanded to

$\begin{matrix}{\frac{1}{2\sigma^{2}}\left\lbrack {\underset{I^{- 1}}{\underbrace{\min_{\chi_{k,2}^{- 1}}\left( {\left| {z_{2} - {r_{22}s_{2}}} \right|^{2} + \left| {z_{1} - {r_{11}s_{1}} - {r_{12}s_{2}}} \right|^{2}} \right)}} - \underset{I^{+ 1}}{\underbrace{\min_{\chi_{k,2}^{+ 1}}\left( {\left| {z_{2} - {r_{22}s_{2}}} \right|^{2} + \left| {z_{1} - {r_{11}s_{1}} - {r_{12}s_{2}}} \right|^{2}} \right)}}} \right\rbrack} & (25)\end{matrix}$

Since χ_(k,2)^(−1)∩χ_(k,2)^(+1)=∅, the two minimizations above can be performed independently; we focus on the minimization of the first term, and the second term is minimized in the same way. Let K_(k)^(−1) (K_(k)^(+1)) denote the set of all constellation points in Ω₂ whose k-th bit is −1 (+1); e.g., in the example in FIG. 4, K₁^(+1)={1+j,1−j} and K₂^(+1)={−1−j,1−j}. Note that |K_(k)^(−1)|=|K_(k)^(+1)|=|Ω₂|/2, K_(k)^(−1)∩K_(k)^(+1)=∅, and K_(k)^(−1)∪K_(k)^(+1)=Ω₂. Therefore, in order to minimize over χ_(k,2)^(−1), |Ω₂|/2 symbols are considered for s₂, and for each of these symbols there are |Ω₁| candidates for s₁. The ML approach explores all |Ω₂||Ω₁|/2 possible candidates and finds the one with the lowest PED. In the proposed approach, however, for each s₂ in K_(k)^(−1), the value of s₁ that results in the lowest local PED is determined first. The globally lowest PED is the lowest PED among these local minima, so the proposed approach is equivalent to calculating the global minimum. This is shown pictorially in FIG. 7 for the calculation of the LLR value of the first bit of the first level, i.e., L(x_(1,1)|y). Therefore, the present approach chooses one value of s₂ in K_(k)^(−1) and finds the local minimum associated with the chosen s₂, i.e.,

$\begin{matrix}{\arg\min_{s_{1} \in \Omega_{1}}\left( {\left| {z_{2} - {r_{22}s_{2}}} \right|^{2} + \left| {z_{1} - {r_{11}s_{1}} - {r_{12}s_{2}}} \right|^{2}} \right)} & (26) \\ {= {\arg\min_{s_{1} \in \Omega_{1}}\left( \left| {z_{1} - {r_{11}s_{1}} - {r_{12}s_{2}}} \right|^{2} \right)},\quad \forall s_{2} \in K_{k}^{- 1},} & (27)\end{matrix}$

where (27) follows from the fact that the first term in (26) is common to all s₁∈Ω₁. As mentioned earlier, the minimization in (27) is equivalent to considering |Ω₁| candidates for s₁ and finding the one with the lowest PED. Using the Schnorr-Euchner method, however, the candidate with the lowest PED can be found without exploring all the possible candidates. This is performed by mapping s₁ to the nearest candidate based on the first-order (zero-forcing) estimate, as follows.

Since r₁₁ is a real number, the problem in (27) can be rewritten as

$\begin{matrix}{\arg\min_{s_{1} \in \Omega_{1}}\left\lbrack {\left| {{\Re\left( \frac{z_{1}}{r_{11}} \right)} - {\Re\left( {\frac{r_{12}}{r_{11}}s_{2}} \right)} - {\Re\left( s_{1} \right)}} \right|^{2} + \left| {{\Im\left( \frac{z_{1}}{r_{11}} \right)} - {\Im\left( {\frac{r_{12}}{r_{11}}s_{2}} \right)} - {\Im\left( s_{1} \right)}} \right|^{2}} \right\rbrack} & (28)\end{matrix}$

where ℜ(•) and ℑ(•) denote the real part and the imaginary part of a complex number, respectively. For square QAM constellations, whose real and imaginary levels are independent, this minimization problem is easily solved by mapping ℜ(s₁) and ℑ(s₁) to the nearest constellation levels based on ℜ(z₁/r₁₁)−ℜ((r₁₂/r₁₁)s₂) and ℑ(z₁/r₁₁)−ℑ((r₁₂/r₁₁)s₂), respectively. Thus, for each s₂∈K_(k)^(−1), this method determines the value of s₁ that minimizes (27) with one single search rather than |Ω₁| searches, which translates into a significant reduction in both algorithmic and hardware complexity. In brief, in order to find I^(−1) in (25), |Ω₂|/2 minimizations corresponding to the elements of K_(k)^(−1) are performed. In the same way, |Ω₂|/2 minimizations corresponding to the elements of K_(k)^(+1) are performed to find I^(+1). Thus, with |Ω₂| searches, all the nodes required to calculate the L(x_(k,2)|y) value are determined. Note that these calculated nodes are sufficient to calculate the LLR values of all the bits of s₂, because for any k∈{1, 2, . . . , M_(c2)}, |K_(k)^(−1)|=|K_(k)^(+1)|=|Ω₂|/2, K_(k)^(−1)∩K_(k)^(+1)=∅, and K_(k)^(−1)∪K_(k)^(+1)=Ω₂. Therefore, the proposed approach calculates all the LLR values of s₂, and they are exactly the same as the values obtained from the ML method. Using the same reasoning, it is easy to show that the LLR values of s₁ are also the same as those from the ML detector. Thus, all the LLR values of the two symbols are determined with only |Ω₁|+|Ω₂| searches rather than the |Ω₁|×|Ω₂| searches of the ML detector.
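The Theorem can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions, not a proof: it uses square QAM on the unnormalized odd-integer grid, random triangular systems with real r₁₁ and r₂₂, a hypothetical bit labeling (bit k of the constellation index, 0 mapped to −1), and σ²=1. It computes the max-log LLRs of all bits of s₂ once exhaustively, as in (24)-(25), and once from |Ω₂| local minima found by the per-axis slicing of (28), and reports the largest difference.

```python
import random


def square_qam(m):
    """Square m-QAM on the unnormalized odd-integer grid."""
    side = int(round(m ** 0.5))
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(re, im) for re in levels for im in levels]


def slice_axis(v, side):
    """Nearest level in {-(side-1), ..., side-1} for one real axis, as used in (28)."""
    k = min(max(round((v + side - 1) / 2), 0), side - 1)
    return 2 * k - (side - 1)


def metric(z, r, s1, s2):
    """|z2 - r22 s2|^2 + |z1 - r11 s1 - r12 s2|^2, the PED inside (25)."""
    (z1, z2), (r11, r12, r22) = z, r
    return abs(z2 - r22 * s2) ** 2 + abs(z1 - r11 * s1 - r12 * s2) ** 2


def label(index, k):
    """Hypothetical labeling: bit k of the constellation index, 0 -> -1 and 1 -> +1."""
    return 1 if (index >> k) & 1 else -1


def llrs_ml(z, r, m1, m2, sigma2):
    """Exhaustive max-log LLRs of all bits of s2: |Omega1| x |Omega2| metrics per bit."""
    pts1, pts2 = square_qam(m1), square_qam(m2)
    out = []
    for k in range(m2.bit_length() - 1):
        best = {-1: float("inf"), 1: float("inf")}
        for i2, s2 in enumerate(pts2):
            for s1 in pts1:
                best[label(i2, k)] = min(best[label(i2, k)], metric(z, r, s1, s2))
        out.append((best[-1] - best[1]) / (2 * sigma2))
    return out


def llrs_first_child(z, r, m1, m2, sigma2):
    """Same LLRs from |Omega2| local minima, each found by slicing as in (28)."""
    (z1, _), (r11, r12, _) = z, r
    side = int(round(m1 ** 0.5))
    local = []
    for s2 in square_qam(m2):
        t = z1 / r11 - (r12 / r11) * s2
        s1 = complex(slice_axis(t.real, side), slice_axis(t.imag, side))
        local.append(metric(z, r, s1, s2))
    out = []
    for k in range(m2.bit_length() - 1):
        neg = min(v for i2, v in enumerate(local) if label(i2, k) == -1)
        pos = min(v for i2, v in enumerate(local) if label(i2, k) == 1)
        out.append((neg - pos) / (2 * sigma2))
    return out


def max_llr_gap(trials=50, m1=64, m2=16, seed=11):
    """Largest |ML - First-Child| LLR difference over random triangular systems."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        r = (rng.uniform(0.3, 2.0), complex(rng.gauss(0, 1), rng.gauss(0, 1)),
             rng.uniform(0.3, 2.0))
        z = (complex(rng.gauss(0, 5), rng.gauss(0, 5)),
             complex(rng.gauss(0, 5), rng.gauss(0, 5)))
        for a, b in zip(llrs_ml(z, r, m1, m2, 1.0), llrs_first_child(z, r, m1, m2, 1.0)):
            gap = max(gap, abs(a - b))
    return gap


print(max_llr_gap())
```

The gap stays at the level of floating-point rounding, consistent with the claim that the First-Child LLRs coincide with the max-log ML LLRs; the same check with the roles of s₁ and s₂ exchanged would cover the flipped ordering.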

The MIMO receiver algorithm provides the exact optimal ML solution and avoids the exhaustive search. Its computational complexity grows linearly with the constellation size, so it is easily scalable to high-order constellation schemes such as 64-QAM and 256-QAM. The log-likelihood ratios are calculated efficiently by visiting the minimum number of nodes theoretically required for the LLR calculation. The detector has a fixed throughput, independent of the SNR and the channel condition. Since the two transmitted symbols (users) are detected independently, and the LLR values of all bits of a symbol (user) are calculated in parallel, the present disclosure is suitable for pipelined and parallel VLSI hardware implementations. It detects the two users of the collaborative-MIMO scheme (in the WiMAX profile) independently. The intrinsic parallelism results in a low-latency hardware architecture with a fixed critical path length, independent of the constellation order. The method is applicable to any 2×N_(R) Matrix-B MIMO architecture, including downlink Matrix-B detection in IEEE 802.16e and the collaborative MIMO (C-MIMO) framework envisioned in the uplink of IEEE 802.16e. It can also be implemented jointly with beam-forming techniques. It exploits the full diversity intrinsic to the C-MIMO scheme, and it can easily accommodate two users with different constellation schemes (e.g., 4-QAM and 64-QAM); the detection complexity of each user is independent of the constellation order of the other user. The simplified First-Child method is square-root free, which simplifies the hardware implementation, and it applies interference cancellation while avoiding projections. It does not require any performance-enhancing signal processing cores, such as channel pre-processing and/or lattice reduction, before the detection core, which results in lower complexity at the receiver.

It will be apparent to persons skilled in the art that a number of variations and modifications can be made without departing from the scope of the present disclosure as defined in the claims. The method steps may be embodied in sets of executable machine code stored in a variety of formats such as object code or source code. Such code is described generically herein as programming code, or a computer program for simplification. Clearly, the executable machine code or portions of the code may be integrated with the code of other programs, implemented as subroutines, plug-ins, add-ons, software agents, by external program calls, in firmware or by other techniques as known in the art.

The embodiments may be executed by a computer processor or similar device programmed in the manner of method steps, or may be executed by an electronic system which is provided with means for executing these steps. Similarly, an electronic memory medium such as computer diskettes, Digital Versatile Disc (DVD), Compact Disc (CD), Random Access Memory (RAM), Read Only Memory (ROM) or similar computer software storage media known in the art, may be programmed to execute such method steps. As well, electronic signals representing these method steps may also be transmitted via a communication network.

The embodiments described above are intended to be illustrative only. The scope of the present disclosure is therefore intended to be limited solely by the scope of the appended claims.

The invention claimed is:
 1. A method of performing a linear-complexity optimal soft Multiple-input-multiple-output (MIMO) detector, the method comprising the steps of: calculating first and second generator matrices based upon a channel matrix; applying the generator matrices to a received vector to generate first and second modified received vectors wherein the first modified received vector comprises an original transmitted vector and the second modified received vector comprises a flipped version of the original transmitted vector; selecting first and second elements of the transmitted vector as child and parent symbols respectively; determining, for both the transmitted vector and the flipped version of the transmitted vector, for each possible value of the parent symbol, a first child; and calculating log-likelihood ratios (LLRs) of all bits for each resulting vector.
 2. The method of claim 1 wherein the first element and second element of the transmitted vector are from different constellation schemes.
 3. The method of claim 2, wherein said different constellation schemes are chosen from a symmetric two-dimensional modulation scheme.
 4. The method of claim 1 wherein the channel pre-processing is implemented by performing a QR-decomposition of the channel matrix, denoted as H=QR, to remove interference between the transmitted symbols.
 5. The method of claim 1 wherein the channel pre-processing is implemented by generating a unitary matrix D as the generator matrix in order to triangularize the channel matrix wherein multiplication of the channel matrix by the D matrix results in a modified channel matrix.
 6. The method of claim 5, wherein the D matrix is a 2×2 matrix for a 2×2 MIMO system derived from a normalized first column of a channel matrix where its first element is a conjugate transpose of a first element in the column, its second element is a negative of a second element in the column, its third element is a conjugate transpose of the second element in the column and its fourth element is the first element in the column.
 7. The method of claim 1 wherein the LLR values of the bits of the second transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the transmitted vector.
 8. The method of claim 7, wherein the LLR values of the bits of the first transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the flipped transmitted vector.
 9. The method of claim 4 wherein the LLR values of the bits of the second transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the transmitted vector.
 10. The method of claim 9, wherein the LLR values of the bits of the first transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the flipped transmitted vector.
 11. The method of claim 5 wherein the LLR values of the bits of the second transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the transmitted vector.
 12. The method of claim 11, wherein the LLR values of the bits of the first transmitted symbol are calculated based on Euclidean distances corresponding to the candidates in the candidate list of the flipped transmitted vector.
 13. The method of claim 1 further comprising the step of providing calculated LLR values to a Convolutional Turbo Code (CTC) decoder for decoding.
 14. The method of claim 4 further comprising the step of providing calculated LLR values to a Convolutional Turbo Code (CTC) decoder for decoding.
 15. The method of claim 5 further comprising the step of providing calculated LLR values to a Convolutional Turbo Code (CTC) decoder for decoding.
 16. The method of claim 4, wherein the first generator matrix is Q and second generator matrix is Q̃, and said first generator and second generator matrices are square unitary matrices of size N_(R)×N_(R); and R is an upper triangular N_(R)×2 matrix whose last N_(R)−2 rows are zero.
 17. The method of claim 3, wherein said symmetric two-dimensional modulation scheme is a quadrature amplitude modulation (QAM) scheme.